32 research outputs found

    Nonnegative matrix factorization with polynomial signals via hierarchical alternating least squares

    No full text
    Nonnegative matrix factorization (NMF) is a widely used tool in data analysis due to its ability to extract significant features from data vectors. Among algorithms developed to solve NMF, hierarchical alternating least squares (HALS) is often used to obtain state-of-the-art results. We generalize HALS to tackle an NMF problem where both input data and features consist of nonnegative polynomial signals. Compared to standard HALS applied to a discretization of the problem, our algorithm is able to recover smoother features, with a computational time growing moderately with the number of observations compared to existing approaches
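
    For orientation, the standard HALS scheme that this abstract generalizes can be sketched as below. This is a minimal illustration of plain HALS on a discretized data matrix, not the polynomial-signal algorithm of the paper; the function name `hals_nmf`, its parameters, and the small floor `eps` are assumptions made for the example.

```python
import numpy as np

def hals_nmf(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix X (m x n) as W @ H with W, H >= 0.

    Plain HALS: each row of H and each column of W is updated in turn by
    its closed-form least-squares minimizer, then clipped to stay nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Update the rows of H with W held fixed.
        WtX = W.T @ X
        WtW = W.T @ W
        for k in range(rank):
            step = (WtX[k] - WtW[k] @ H) / max(WtW[k, k], eps)
            H[k] = np.maximum(eps, H[k] + step)
        # Update the columns of W with H held fixed.
        XHt = X @ H.T
        HHt = H @ H.T
        for k in range(rank):
            step = (XHt[:, k] - W @ HHt[:, k]) / max(HHt[k, k], eps)
            W[:, k] = np.maximum(eps, W[:, k] + step)
    return W, H
```

    Flooring at a small `eps` rather than at exactly zero is one common way to avoid a factor collapsing to the zero vector; other safeguards are possible.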

    Joint optimization of predictive performance and selection stability

    No full text
    Current feature selection methods, especially applied to high dimensional data, tend to suffer from instability since marginal modifications in the data may result in largely distinct selected feature sets. Such instability strongly limits a sound interpretation of the selected variables by domain experts. We address this issue by optimizing jointly the predictive accuracy and selection stability and by deriving Pareto-optimal trajectories. Our approach extends the Recursive Feature Elimination algorithm by enforcing the selection of some features based on a stable, univariate criterion. Experiments conducted on several high dimensional microarray datasets illustrate that large stability gains are obtained with no significant drop of accuracy

    Improving accuracy by reducing the importance of hubs in nearest-neighbor recommendations

    No full text
    A traditional approach for recommending items to persons consists of including a step of forming neighborhoods of users/items. This work focuses on such nearest-neighbor approaches and, more specifically, on a particular type of neighbors, the ones frequently appearing in the neighborhoods of users/items (i.e., very similar to many other users/items in the data set), referred to as hubs in the literature. The aim of this paper is to explore through experiments how the presence of hubs affects the accuracy of nearest-neighbor recommendations

    Compressive Learning of Generative Networks

    No full text
    Generative networks implicitly approximate complex densities from their sampling with impressive accuracy. However, because of the enormous scale of modern datasets, this training process is often computationally expensive. We cast generative network training into the recent framework of compressive learning: we reduce the computational burden of large-scale datasets by first harshly compressing them in a single pass as a single sketch vector. We then propose a cost function, which approximates the Maximum Mean Discrepancy metric, but requires only this sketch, which makes it time- and memory-efficient to optimize

    Survival Analysis with Cox Regression and Random Non-linear Projections

    No full text
    Proportional Cox hazard models are commonly used in survival analysis, since they define risk scores which can be directly interpreted in terms of hazards. Yet they cannot account for non-linearities in their covariates. This paper shows how to use random non-linear projections to efficiently address this limitation

    Lower bounds on the nonnegative rank using a nested polytopes formulation

    No full text
    Computing the nonnegative rank of a nonnegative matrix has been proven to be, in general, NP-hard. However, this quantity has many interesting applications, e.g., it can be used to compute the extension complexity of polytopes. Therefore researchers have been trying to approximate this quantity as closely as possible with strong lower and upper bounds. In this work, we introduce a new lower bound on the nonnegative rank based on a representation of the matrix as a pair of nested polytopes. The nonnegative rank then corresponds to the minimum number of vertices of any polytope nested between these two polytopes. Using the geometric concept of supporting corner, we introduce a parametrized family of computable lower bounds and present preliminary numerical results on slack matrices of regular polygons

    The Sum-over-Forests clustering

    No full text
    This work introduces a novel way to identify dense regions in a graph based on a mode-seeking clustering technique, relying on the Sum-Over-Forests (SoF) density index (which can easily be computed in closed form through a simple matrix inversion) as a local density estimator. We first identify the modes of the SoF density in the graph. Then, the nodes of the graph are assigned to the cluster corresponding to the nearest mode, according to a new kernel, also based on the SoF framework. Experiments on artificial and real datasets show that the proposed index performs well in node clustering

    Image completion via nonnegative matrix factorization using HALS and B-splines

    No full text
    When performing image completion, it is common to assume that images are smooth and low-rank, when viewed as matrices of pixel intensities. In this work, we use nonnegative matrix factorization to successively refine the image by representing alternatively rows and columns as smooth signals using splines. Previous work solved this model using an alternating direction method of multipliers. Instead, we propose to use a version of the hierarchical alternating least squares algorithm adapted to handle splines, and show in numerical experiments that it outperforms the existing method. Performance can be further improved by increasing progressively the size of used splines. We also introduce a non-iterative algorithm using the same NMF approach, where factorization is computed in a fast and accurate way but for which convergence is harder to achieve

    Finding the most interpretable MDS rotation for sparse linear models based on external features

    No full text
    One approach to interpreting multidimensional scaling (MDS) embeddings is to estimate a linear relationship between the MDS dimensions and a set of external features. However, because MDS only preserves distances between instances, the MDS embedding is invariant to rotation. As a result, the weights characterizing this linear relationship are arbitrary and difficult to interpret. This paper proposes a procedure for selecting the most pertinent rotation for interpreting a 2D MDS embedding
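
    The rotation issue described in this abstract can be checked numerically with a small sketch. It assumes the linear relationship is fitted by ordinary least squares from the external features to the embedding coordinates (the abstract's sparse models are not reproduced here); the helper names `rotation`, `pairwise_distances` and `fit_weights` are hypothetical.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix for an angle theta given in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def pairwise_distances(Y):
    """Euclidean distance between every pair of rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fit_weights(F, Y):
    """Least-squares weights W such that F @ W approximates Y."""
    W, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return W

rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 2))   # stand-in for a 2D MDS embedding
F = rng.normal(size=(30, 4))   # stand-in for external features
R = rotation(np.pi / 4)
Y_rot = Y @ R.T                # the same embedding, rotated

# Distances are unchanged by the rotation, but the fitted weights are not.
same_distances = np.allclose(pairwise_distances(Y), pairwise_distances(Y_rot))
W, W_rot = fit_weights(F, Y), fit_weights(F, Y_rot)
weights_changed = not np.allclose(W, W_rot)
```

    Because least squares is linear in the targets, the rotated weights equal `W @ R.T`, so every rotation gives an equally valid embedding with different weights. This is why a criterion for choosing the rotation is needed.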